Embedded Systems

MR3D-Net: Dynamic Multi-Resolution 3D Sparse Voxel Grid Fusion for LiDAR-Based Collective Perception

by Sven Teufel, Jörg Gamerdinger, Georg Volk, and Oliver Bringmann
In 2024 IEEE Intelligent Transportation Systems Conference (IEEE ITSC 2024), 2024.

Keywords: Collective Perception, Data Fusion, LiDAR-Based Object Detection

Abstract

The safe operation of automated vehicles depends on their ability to perceive the environment comprehensively. However, occlusion, sensor range, and environmental factors limit their perception capabilities. To overcome these limitations, collective perception enables vehicles to exchange information. Fusing this exchanged information is nevertheless a challenging task. Early fusion approaches require large amounts of bandwidth, while intermediate fusion approaches face interchangeability issues. Late fusion of shared detections is currently the only feasible approach, yet it often results in inferior performance due to information loss. To address this issue, we propose MR3D-Net, a dynamic multi-resolution 3D sparse voxel grid fusion backbone architecture for LiDAR-based collective perception. We show that sparse voxel grids at varying resolutions provide a meaningful and compact environment representation that can adapt to the communication bandwidth. MR3D-Net achieves state-of-the-art performance on the OPV2V 3D object detection benchmark while reducing the required bandwidth by up to 94% compared to early fusion. Code is available at https://github.com/ekut-es/MR3D-Net
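To illustrate the core idea of the representation, the following is a minimal sketch (not the paper's actual implementation) of building sparse voxel grids from a LiDAR point cloud at several resolutions with NumPy. All function names and parameters here are illustrative; the key property shown is that coarser grids contain fewer occupied voxels, so the resolution can be chosen to match the available communication bandwidth.

```python
import numpy as np

def sparse_voxel_grid(points, voxel_size):
    # Quantize 3D points to integer voxel indices and keep only
    # occupied voxels -- a sparse representation of the scene.
    indices = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(indices, axis=0)

def multi_resolution_grids(points, voxel_sizes):
    # Build one sparse grid per resolution; coarser voxels
    # (larger size) yield fewer occupied cells to transmit.
    return {s: sparse_voxel_grid(points, s) for s in voxel_sizes}

# Synthetic LiDAR-like point cloud (illustrative only)
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 50.0, size=(10_000, 3))

grids = multi_resolution_grids(points, voxel_sizes=[0.4, 0.8, 1.6])
for size, grid in grids.items():
    print(f"voxel size {size}: {len(grid)} occupied voxels")
```

Transmitting only occupied voxel indices (optionally with per-voxel features) instead of raw points is what allows the bandwidth reduction the abstract describes.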